Abstract


Complex AM and FM signal models can be used to represent non-stationary signals such as speech [1,2,3]. The complex AM signal model has been found suitable for sustained voiced speech phonemes [1,3], while the complex FM signal model can represent sustained unvoiced speech phonemes [2,3]. This voiced/unvoiced classification is, however, not appropriate: this study shows that phonemes having most of their energy in the low-frequency region can be fitted by the complex AM model, while the complex FM model is suitable for those having most of their energy in the high-frequency region. Moreover, in sustained vowel and consonant sounds the gain of the signal is constant, so the complex AM and FM models can give the parameters directly, whereas in naturally spoken speech the gain varies with time. This study takes the time-varying gain of speech into account and explains parameter estimation with the two models after the speech signal has been made of constant gain. The time-varying gain of the speech is estimated and fitted by a polynomial model, and the parameters of the constant-gain speech signal and the coefficients of the polynomial are coded.
Chapter 1


Introduction 


One approach to the problem of representing the speech signal is to use the speech production model, in which speech is viewed as the result of passing a glottal excitation waveform through a time-varying linear filter that models the resonant characteristics of the vocal tract. It is assumed that the glottal excitation can be in one of two possible states: quasi-periodic pulses (during voiced speech) or random noise (during unvoiced speech). In this model the basic speech parameters, e.g. pitch, formants, spectra and vocal tract area functions, are estimated and coded for speech compression.

In the above model the notion of quasi-stationarity is applied, and the model is identified over short data segments [12,13]. A compromise is therefore needed between the faithfulness of the model in representing the details of the signal and the accuracy with which the model parameters can be estimated.

Low-rate speech coding by the sinusoidal model has been addressed in [12]. In this model the glottal excitation is represented by sinusoidal components of arbitrary amplitudes, frequencies and phases. This model is also frame-based, each frame being 10 to 20 ms long, whereas the length of one phoneme is generally above 40 ms. A single phoneme therefore spans 4 or 5 frames; in other words, 4 or 5 parameter sets generally have to be taken for one phoneme.

Multicomponent AM and FM signal models are developed for representation 
of voiced and unvoiced speech phonemes respectively [1,2,3]. 

To estimate the parameters of the multicomponent complex AM signal model, the accumulated autocorrelation function (AACF) of the speech data is computed by summing the time-dependent autocorrelation functions (ACFs) over an assigned time frame. The PSD plot of the AACF is then used to obtain the carrier and modulating frequencies. Once the frequencies are known, the amplitude and modulation-index parameters of the fitted model can be found by solving a linear estimation problem [1].

The above technique of frequency estimation was not suitable for fast on-line processing of data. A new technique, which fits the AACF sequence to a linear prediction model, was therefore demonstrated: the zeros of the prediction error filter (PEF) are used to estimate the carrier and modulating frequencies [3]. This enabled fast, on-line, automatic processing of speech data.

For estimation of the frequency parameters of the multicomponent complex FM signal model, a sequence of product functions of the signal samples is computed and then processed to obtain the autoregressive PSD of the underlying process. This PSD plot is studied in conjunction with the DFT plot of the data to obtain the carrier and modulating frequencies. The modulation indices are obtained from the magnitude plot of the DFT, and the remaining parameters can be estimated by linear estimation [2].



The above technique was again not suitable for on-line processing of data. Moreover, determining the modulation indices from the magnitude plot of the DFT involves separating the plot into its individual components, which makes the parameter estimation problem more difficult. A new technique, which estimates the frequencies with a linear prediction model, was therefore demonstrated [3]. In all the parameter estimation and model fitting methods of [1,2,3], however, the time-varying gain (variance) of speech is not considered, and the models are fitted to sustained vowel and consonant sounds.

In this study the time-varying gain of speech is estimated, and the original speech signal is divided by this estimated gain so that the resulting signal has constant gain. The complex AM or FM model is then applied to this constant-gain speech to estimate the parameters.

The gain function is fitted by a polynomial model whose coefficients are estimated in the least-squares sense. The speech parameters and the gain coefficients are coded to check the potential for speech compression of the models proposed in [1,2,3].



Chapter 2 

Complex AM Signal Model 

2.1 The Model 

In this section the complex AM signal model, as proposed and explained in [1], is introduced. The discrete-time complex random process $x(n)$ consisting of $M$ single-tone amplitude-modulated signals is represented by

$$x(n) = \sum_{i=1}^{M} A_i\left[1 + \mu_i e^{j\nu_i nT}\right] e^{j(\omega_i nT + \phi_i)} \qquad (2.1)$$

where $A_i$ is the carrier amplitude of the $i$-th constituent signal, $\mu_i$ is the modulation index, $\omega_i$ is the carrier angular frequency, $\nu_i$ is the modulating angular frequency, $\phi_i$ is the independent and identically distributed (i.i.d.) random phase, and $T$ is the sampling interval. The random phase $\phi_i$ is assumed to be uniformly distributed over $[0, 2\pi]$.

The complex sequence $x(n)$ may be used to express the time-dependent autocorrelation function (ACF)

$$r_x(n,k) = E\{x^*(n)\,x(n+k)\} \qquad (2.2)$$

where $E$ stands for the expectation operator and $*$ denotes complex conjugation.

Substituting for $x^*(n)$ and $x(n+k)$ and evaluating the right-hand side of Eqn. (2.2) yields

$$r_x(n,k) = \sum_{i=1}^{M} A_i^2\left[1 + \mu_i e^{-j\nu_i nT}\right]\left[1 + \mu_i e^{j\nu_i (n+k)T}\right] e^{j\omega_i kT} \qquad (2.3)$$

where the following result of expectation is utilized:

$$E\left\{e^{j(\phi_l - \phi_i)}\right\} = \begin{cases} 1, & l = i \\ 0, & l \neq i \end{cases}$$

The accumulated autocorrelation function (AACF) $C_x(k)$ is computed by summing the ACF $r_x(n,k)$ over a fixed time frame $[n_1, n_2]$:

$$C_x(k) = \sum_{n=n_1}^{n_2} r_x(n,k) \qquad (2.4)$$


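Evaluating the right-hand side of Eqn. (2.4), one gets

$$C_x(k) = \sum_{i=1}^{M} B_i\, e^{j\omega_i kT} + \sum_{i=1}^{M} C_i\, e^{j(\omega_i + \nu_i)kT} \qquad (2.5)$$

where

$$B_i = A_i^2\left[(n_2 - n_1 + 1) + \mu_i \sum_{n=n_1}^{n_2} e^{-j\nu_i nT}\right]$$

and

$$C_i = A_i^2\,\mu_i\left[\mu_i (n_2 - n_1 + 1) + \sum_{n=n_1}^{n_2} e^{j\nu_i nT}\right]$$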

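Rewriting Eqn. (2.5) provides the final expression for the time-independent AACF $C_x(k)$:

$$C_x(k) = \sum_{i=1}^{2M} R_i\, e^{j\Omega_i kT} \qquad (2.6)$$

where

$$R_i = \begin{cases} B_i, & i = 1, \dots, M \\ C_{i-M}, & i = M+1, \dots, 2M \end{cases}$$

and

$$\Omega_i = \begin{cases} \omega_i, & i = 1, \dots, M \\ \omega_{i-M} + \nu_{i-M}, & i = M+1, \dots, 2M \end{cases}$$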


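It is explicit from Eqn. (2.6) that the AACF $C_x(k)$ consists of complex sinusoids whose angular frequencies are the carrier angular frequencies $\omega_i$ and the sum frequencies $\omega_i + \nu_i$ of the modeled signal $x(n)$. For an unmodulated carrier ($\mu_i = 0$), $C_i = 0$ and $B_i = A_i^2(n_2 - n_1 + 1)$ is real.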
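A quick numerical check of Eqns. (2.3) to (2.5) can be sketched in Python with NumPy. For a single component ($M = 1$) no cross terms arise and the random phase cancels in $x^*(n)x(n+k)$, so the frame sum can be compared directly with the closed-form coefficients $B_1$ and $C_1$ of Eqn. (2.5). The function names are illustrative only and are not part of the method in [1].

```python
import numpy as np


def aacf_direct(k, T, A, mu, omega, nu, phi, n1, n2):
    """Sum x*(n) x(n+k) over n = n1..n2 for one component of Eqn (2.1)."""
    def x(m):
        return A * (1 + mu * np.exp(1j * nu * m * T)) * np.exp(1j * (omega * m * T + phi))

    n = np.arange(n1, n2 + 1)
    return np.sum(np.conj(x(n)) * x(n + k))


def aacf_closed_form(k, T, A, mu, omega, nu, n1, n2):
    """Evaluate Eqn (2.5) for M = 1 using the B_i and C_i expressions."""
    n = np.arange(n1, n2 + 1)
    frame = n2 - n1 + 1
    B = A**2 * (frame + mu * np.sum(np.exp(-1j * nu * n * T)))
    C = A**2 * mu * (mu * frame + np.sum(np.exp(1j * nu * n * T)))
    return B * np.exp(1j * omega * k * T) + C * np.exp(1j * (omega + nu) * k * T)
```

For example, with $T = 0.01$, $A = 1$, $\mu = 0.5$, $\omega = 20$, $\nu = 1$ and the frame $[100, 199]$, the two functions agree to round-off for every lag; setting $\mu = 0$ leaves only a real coefficient, the property used later to recognise unmodulated carriers.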
2.2 Model Fitting and Parameter Estimation


Following [1], and starting with the sampled sequence $\{y(n);\ n = 0, 1, \dots, N-1\}$ that is to be modeled, one first computes the set of AACFs $\{C_y(k);\ k = -J, \dots, J\}$ given by

$$C_y(k) = \sum_{n=n_1}^{n_2} y^*(n)\, y(n+k) \qquad (2.7)$$

where the length of the time frame is taken as $n_2 - n_1 + 1 = N - 2J$. Note that the sequence $y(n)$ may be considered a single observation (sample) of the discrete-time random process $x(n)$. Furthermore, since the concept of ergodicity does not apply to non-stationary signals, ensemble averages cannot be replaced by time averages; one therefore has no choice but to drop the expectation operator of Eqn. (2.2), which is why it does not appear in Eqn. (2.7).
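The AACF estimate of Eqn. (2.7), with the frame $n_1 = J$, $n_2 = N - 1 - J$ that is also used in section (2.3), can be sketched as follows. This is an illustrative NumPy version; the function name `aacf` is hypothetical.

```python
import numpy as np


def aacf(y, J):
    """AACF of Eqn (2.7) from one realisation y(0..N-1), for k = -J..J.

    Uses the frame n1 = J, n2 = N - 1 - J, so every y(n + k) lies inside the
    record and the frame length is N - 2J.
    """
    y = np.asarray(y, dtype=complex)
    N = len(y)
    if not 0 <= J < N / 2:
        raise ValueError("need 0 <= J < N/2")
    n = np.arange(J, N - J)              # n1 .. n2 inclusive
    lags = np.arange(-J, J + 1)
    C = np.array([np.sum(np.conj(y[n]) * y[n + k]) for k in lags])
    return lags, C
```

For a single complex exponential $y(n) = e^{j\omega nT}$ every term of the sum equals $e^{j\omega kT}$, so the estimate is exactly $(N - 2J)\,e^{j\omega kT}$.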

The sequence $C_y(k)$ is fitted to a linear prediction model of an AR($p$) process to obtain a prediction error filter (PEF) defined as follows:

$$A(z) = 1 + a[1]z^{-1} + a[2]z^{-2} + \cdots + a[p]z^{-p} \qquad (2.8)$$

where $a[1], a[2], \dots, a[p]$ are the linear prediction (AR) coefficients to be determined. The zeros of $A(z)$ will be used to determine the angular frequencies of the model.



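In order to estimate the AR parameters, the modified covariance method [4] has been employed. This method appears to yield statistically stable spectral estimates with high resolution. For an input sequence $x(n)$ (here, the AACF sequence), the normal equations can be written in matrix form as follows:

$$\begin{bmatrix} c_{xx}(1,1) & c_{xx}(1,2) & \cdots & c_{xx}(1,p) \\ c_{xx}(2,1) & c_{xx}(2,2) & \cdots & c_{xx}(2,p) \\ \vdots & \vdots & & \vdots \\ c_{xx}(p,1) & c_{xx}(p,2) & \cdots & c_{xx}(p,p) \end{bmatrix} \begin{bmatrix} a[1] \\ a[2] \\ \vdots \\ a[p] \end{bmatrix} = -\begin{bmatrix} c_{xx}(1,0) \\ c_{xx}(2,0) \\ \vdots \\ c_{xx}(p,0) \end{bmatrix} \qquad (2.9)$$

where

$$c_{xx}(j,k) = \frac{1}{2(N-p)}\left[\sum_{n=p}^{N-1} x^*(n-j)\,x(n-k) + \sum_{n=0}^{N-1-p} x(n+j)\,x^*(n+k)\right]$$

and $N$ is the number of data points.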

The matrix in Eqn. (2.9) is decomposed using the singular value decomposition (SVD) technique [6,8]. Singular values that are comparatively small in magnitude are set to zero before solving the linear equations of Eqn. (2.9); zeroing the smaller singular values gives the estimation technique inherent noise immunity [7]. Once the AR parameters are obtained, we determine the zeros of the PEF defined by Eqn. (2.8). A zero $z = e^{j\Omega T}$ on or near the unit circle corresponds to the angular frequency $\Omega = \arg(z)/T$, and from these zeros the carrier and modulating angular frequencies of the signal $x(n)$ are obtained.
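The estimation chain of Eqns. (2.8) and (2.9), followed by SVD truncation and root finding, might look like the NumPy sketch below. The relative singular-value threshold `rel_tol` and the unit-circle tolerance `radius_tol` are assumptions, since the text only says that "comparatively small" singular values are zeroed and that zeros "on or near" the unit circle are kept; all function names are hypothetical.

```python
import numpy as np


def modified_covariance(x, p):
    """(p+1) x (p+1) matrix c_xx(j, k) of the modified covariance method."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    c = np.empty((p + 1, p + 1), dtype=complex)
    for j in range(p + 1):
        for k in range(p + 1):
            fwd = np.sum(np.conj(x[p - j:N - j]) * x[p - k:N - k])   # n = p..N-1
            bwd = np.sum(x[j:N - p + j] * np.conj(x[k:N - p + k]))   # n = 0..N-1-p
            c[j, k] = (fwd + bwd) / (2 * (N - p))
    return c


def truncated_svd_solve(A, b, rel_tol):
    """Least-squares solution of A a = b with small singular values set to zero."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    keep = s > rel_tol * s[0]
    s_inv = np.where(keep, 1.0 / np.where(keep, s, 1.0), 0.0)
    return Vh.conj().T @ (s_inv * (U.conj().T @ b))


def pef_frequencies(x, p, T, rel_tol=1e-6, radius_tol=1e-3):
    """AR(p) fit by Eqn (2.9), zeros of A(z) of Eqn (2.8), angular frequencies."""
    c = modified_covariance(x, p)
    a = truncated_svd_solve(c[1:, 1:], -c[1:, 0], rel_tol)
    # z^p + a[1] z^(p-1) + ... + a[p] has the same zeros as A(z)
    zeros = np.roots(np.concatenate(([1.0], a)))
    near_circle = np.abs(np.abs(zeros) - 1.0) < radius_tol
    return np.sort(np.angle(zeros[near_circle]) / T)
```

In a noise-free check with three complex exponentials at 20, 40 and 63 rad per unit time and $p = 3$, no singular value is discarded and the three zeros fall on the unit circle at those frequencies; with the much higher orders used in section (2.3), `rel_tol` effectively fixes the retained rank.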

Before the carrier and modulating angular frequencies can be found, the zeros of the PEF have to be sorted into unmodulated carriers and modulated pairs. For this, the residues $R_i$ are computed for all frequencies of the AACF sequence. It has been shown [1] that for an unmodulated carrier the residue is real, whereas the residues are complex for the modulated frequencies. This feature is employed to identify the unmodulated carrier frequencies. The rest of the peaks are considered in pairs, taking two adjacent peaks for each modulated carrier; by Eqn. (2.6), the lower peak of a pair is the carrier $\omega_i$ and the spacing between the two is the modulating frequency $\nu_i$. Once all frequencies are known, the amplitude, phase and modulation-index parameters of the fitted model can be obtained in the next stage by solving a linear estimation problem, as presented in the following paragraphs.
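The text does not say how the residues $R_i$ are computed. One plausible sketch, assumed here, fits $C_y(k) = \sum_i R_i e^{j\Omega_i kT}$ of Eqn. (2.6) by least squares at the estimated frequencies and then treats a residue with a small imaginary part as real. The tolerance `imag_tol` and the function names are hypothetical.

```python
import numpy as np


def residues(C, lags, Omega, T):
    """Least-squares residues R_i with C(k) ~ sum_i R_i exp(j Omega_i k T), Eqn (2.6)."""
    V = np.exp(1j * T * np.outer(lags, Omega))
    R, *_ = np.linalg.lstsq(V, C, rcond=None)
    return R


def classify_frequencies(Omega, R, imag_tol=0.05):
    """Split AACF frequencies into unmodulated carriers and (omega_i, nu_i) pairs.

    A residue whose imaginary part is small relative to its magnitude is taken
    as real (unmodulated carrier); the remaining frequencies, sorted, are paired
    as (omega_i, omega_i + nu_i).
    """
    order = np.argsort(Omega)
    Om = np.asarray(Omega, dtype=float)[order]
    Rs = np.asarray(R)[order]
    is_real = np.abs(Rs.imag) <= imag_tol * np.abs(Rs)
    unmodulated = Om[is_real]
    rest = Om[~is_real]
    pairs = [(rest[i], rest[i + 1] - rest[i]) for i in range(0, len(rest) - 1, 2)]
    return unmodulated, pairs
```

A leftover unpaired frequency (odd count) is simply ignored in this sketch; in practice it would prompt a new choice of frequencies, as described at the end of this section.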

Considering the inclusion of unmodulated carrier frequencies in the model, the random sequence $x(n)$ can be modeled as

$$x(n) = \sum_{i=1}^{M} A_i\left[1 + \mu_i e^{j\nu_i nT}\right] e^{j(\omega_i nT + \phi_i)} + \sum_{i=M+1}^{L} A_i\, e^{j(\omega_i nT + \phi_i)} \qquad (2.10)$$

where the $(L - M)$ unmodulated carrier angular frequencies are included, and the last term of the above equation constitutes the stationary part of the discrete-time process.

The discrete-time signal $y(n)$ is fitted to the complex AM signal model of Eqn. (2.1), which is rewritten as

$$y(n) = \sum_{i=1}^{M} a_i z_i^{\,n} + \sum_{i=1}^{M} a_i \mu_i (z_i w_i)^{n} \qquad (2.11)$$

for $n = 0, 1, \dots, N-1$, where $a_i = A_i e^{j\phi_i}$ is the unknown complex amplitude of the carrier, $\mu_i$ is the unknown modulation index, and $z_i = e^{j\omega_i T}$ and $w_i = e^{j\nu_i T}$ are parameters that can be computed from the estimated $\omega_i$ and $\nu_i$ values.

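For the $(L - M)$ unmodulated frequencies $\mu_i = 0$ (and $\nu_i = 0$, so $w_i = 1$ and a column $(z_i w_i)^n$ would duplicate $z_i^n$). Hence, to avoid ill-conditioning, Eqn. (2.11) can be written as follows:

$$y(n) = \sum_{i=1}^{L} a_i z_i^{\,n} + \sum_{i=1}^{M} a_i \mu_i (z_i w_i)^{n} \qquad (2.12)$$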

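Written in matrix form, Eqn. (2.12) becomes

$$\begin{bmatrix} y(0) \\ y(1) \\ y(2) \\ \vdots \\ y(N-1) \end{bmatrix} = \begin{bmatrix} 1 & \cdots & 1 & 1 & \cdots & 1 \\ z_1 & \cdots & z_L & z_1 w_1 & \cdots & z_M w_M \\ z_1^2 & \cdots & z_L^2 & (z_1 w_1)^2 & \cdots & (z_M w_M)^2 \\ \vdots & & \vdots & \vdots & & \vdots \\ z_1^{N-1} & \cdots & z_L^{N-1} & (z_1 w_1)^{N-1} & \cdots & (z_M w_M)^{N-1} \end{bmatrix} \begin{bmatrix} a_1 \\ \vdots \\ a_L \\ a_1\mu_1 \\ \vdots \\ a_M\mu_M \end{bmatrix} \qquad (2.13)$$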


which should be sufficiently overdetermined, with $N > L + M$. Eqn. (2.13) is then solved in the least-squares sense to find the $a_i$ and $\mu_i$ parameters ($\mu_i$ follows by dividing the estimated product $a_i\mu_i$ by $a_i$). The pseudo-inverse of the matrix is computed by the singular value decomposition technique. Note that Eqn. (2.13) presents a linear estimation problem; consequently, the amplitude, phase and modulation-index parameters will be computed accurately provided the angular frequency parameters are known accurately [1].
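Solving Eqn. (2.13) with an SVD-based pseudo-inverse can be sketched as below (NumPy; the function name is illustrative). The column ordering follows Eqn. (2.13), and the magnitude and angle of each estimated $a_i$ give $A_i$ and $\phi_i$.

```python
import numpy as np


def fit_complex_am(y, T, omega_mod, nu_mod, omega_unmod=()):
    """Solve Eqn (2.13) for a_i = A_i e^{j phi_i} and mu_i, given the frequencies.

    Column order: z_i^n for the M modulated carriers, z_i^n for the L - M
    unmodulated carriers, then (z_i w_i)^n for the M modulated carriers.
    """
    y = np.asarray(y, dtype=complex)
    n = np.arange(len(y))[:, None]
    carriers = np.concatenate([np.asarray(omega_mod, float),
                               np.asarray(omega_unmod, float)])
    sidebands = np.asarray(omega_mod, float) + np.asarray(nu_mod, float)
    G = np.hstack([np.exp(1j * n * T * carriers), np.exp(1j * n * T * sidebands)])
    if G.shape[0] <= G.shape[1]:
        raise ValueError("Eqn (2.13) must be overdetermined: N > L + M")
    theta = np.linalg.pinv(G) @ y          # SVD-based pseudo-inverse
    M, L = len(sidebands), len(carriers)
    a = theta[:L]
    mu = theta[L:] / a[:M]                 # a_i mu_i divided by a_i
    return a, mu
```

Applied to a noise-free signal built from the three parameter sets of section (2.3), with carriers ordered as [20, 60] modulated and [40] unmodulated, this returns $|a| = (1.0, 0.75, 0.5)$, phases $(2.0, 1.5, 3.0)$ and $\mu = (0.5, 0.65)$ up to round-off.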





After all the parameters have been estimated, a data set is regenerated with these parameters and compared with the original input signal using a spectral distortion measure [5]. A mismatch between the regenerated and input signals can occur because of a wrong choice of unmodulated carrier and modulating frequencies. In such a case a new choice may have to be made and the estimation process repeated until a proper match between the input and regenerated signals is obtained [1].
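The spectral distortion measure of [5] is not defined in this section. As a stand-in only, the sketch below uses the RMS log-spectral distance between periodograms, in dB; the choice of measure, the FFT length `nfft` and the small `floor` that guards the logarithm are all assumptions.

```python
import numpy as np


def log_spectral_distance(x, x_hat, nfft=1024, floor=1e-12):
    """RMS difference (dB) between the periodograms of x and x_hat.

    Assumed stand-in for the spectral distortion measure of [5].
    """
    P = np.abs(np.fft.fft(x, nfft)) ** 2
    P_hat = np.abs(np.fft.fft(x_hat, nfft)) ** 2
    diff_db = 10 * np.log10((P + floor) / (P_hat + floor))
    return float(np.sqrt(np.mean(diff_db ** 2)))
```

Identical signals give 0 dB, and scaling a signal by 2 gives about 6.02 dB; a regenerated signal would be accepted when the distance falls below some threshold, which the text leaves unspecified.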

2.3 Simulation Study 

In this section, computer-synthesized data are considered for the study of model fitting.

The complex signal $y(n)$, consisting of two single-tone amplitude-modulated signals and one unmodulated carrier signal, is sampled at N = 500 points. The three sets of parameters in Eqn. (2.1) are chosen as follows:

$$A_1 = 1.00,\quad \omega_1 = 20,\quad \nu_1 = 1,\quad \phi_1 = 2.0,\quad \mu_1 = 0.50$$
$$A_2 = 0.50,\quad \omega_2 = 40,\quad \nu_2 = 0,\quad \phi_2 = 3.0,\quad \mu_2 = 0.00$$
$$A_3 = 0.75,\quad \omega_3 = 60,\quad \nu_3 = 3,\quad \phi_3 = 1.5,\quad \mu_3 = 0.65$$

and the sampling interval used is T = 0.01 units of time. Utilizing Eqn. (2.7) with $n_1 = J$ and $n_2 = N - 1 - J$, the AACF sequence $C_y(k)$ for $k = -J, \dots, 0, \dots, J$ is computed. It has been found empirically that, for best results, $J$ should be chosen as N/3 or 2N/5 [1,3]. From the computed AACF sequence, the AR parameters of the PEF are found with the modified covariance technique of Eqn. (2.9). The model order is chosen high enough (above 100) that no frequency peak is missed even in the presence of high noise. The zeros of the PEF are then computed, and only those zeros which lie on or near the unit circle are considered for further processing. As a first step, the residues pertaining to the chosen frequencies (zeros) are found. Using these residues, the unmodulated carrier and modulating frequencies are identified as discussed in section (2.2). The other parameters are determined by the linear least-squares technique. True and estimated values of the parameters, along with percentage errors, are listed in table (2.1), and the residues for the various frequencies are listed in table (2.2).
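The synthetic test signal of this section and its noisy versions might be generated as in the sketch below. The parameter tuples are the three sets listed above; defining SNR as the ratio of mean signal power to mean noise power and using circular complex white Gaussian noise are assumptions, since the text does not state how the noise was added. The names `PARAMS`, `synth_complex_am` and `add_complex_noise` are hypothetical.

```python
import numpy as np

# (A_i, omega_i, nu_i, phi_i, mu_i) for the three components listed above
PARAMS = [(1.00, 20.0, 1.0, 2.0, 0.50),
          (0.50, 40.0, 0.0, 3.0, 0.00),
          (0.75, 60.0, 3.0, 1.5, 0.65)]


def synth_complex_am(N, T, params=PARAMS):
    """Noise-free y(n), n = 0..N-1, following Eqn (2.1)."""
    n = np.arange(N)
    y = np.zeros(N, dtype=complex)
    for A, omega, nu, phi, mu in params:
        y += A * (1 + mu * np.exp(1j * nu * n * T)) * np.exp(1j * (omega * n * T + phi))
    return y


def add_complex_noise(y, snr_db, rng):
    """Add circular white Gaussian noise at the given SNR (mean power ratio)."""
    noise_power = np.mean(np.abs(y) ** 2) / 10 ** (snr_db / 10)
    noise = np.sqrt(noise_power / 2) * (rng.standard_normal(y.shape)
                                        + 1j * rng.standard_normal(y.shape))
    return y + noise
```

For instance, `add_complex_noise(synth_complex_am(500, 0.01), 20.0, np.random.default_rng(1))` produces one 20 dB realisation; repeating the estimation over several seeds would show how much the figures in table (2.1) vary from one noise realisation to the next.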




Table 2.1: Estimated values of complex AM model parameters for synthesized data

                          Estimated value (percentage error)
Parameter   True value    No noise        SNR = 30 dB     SNR = 20 dB
ω1          20.000        20.000 (0.00)   20.001 (0.00)   20.003 (0.01)
ω2          40.000        40.000 (0.00)   40.002 (0.01)   40.007 (0.01)
ω3          60.000        60.000 (0.00)   59.999 (0.00)   59.993 (0.01)
ν1          1.000         1.000 (0.00)    1.001 (0.12)    1.024 (2.40)
ν3          3.000         3.000 (0.00)    3.001 (0.02)    2.999 (0.03)
A1          1.000         1.000 (0.00)    0.997 (0.30)    0.979 (2.10)
A2          0.500         0.500 (0.00)    0.498 (0.40)    0.497 (0.15)
A3          0.750         0.750 (0.00)    0.752 (0.15)    0.753 (0.26)
φ1          2.000         2.000 (0.00)    1.992 (0.40)    1.975 (1.25)
φ2          3.000         3.000 (0.00)    2.999 (0.03)    2.959 (1.36)
φ3          1.500         1.500 (0.00)    1.497 (0.20)    1.485 (1.00)
μ1          0.500         0.500 (0.00)    0.503 (0.60)    0.521 (4.2)
μ2          -             -               -               -
μ3          0.650         0.650 (0.00)    0.652 (0.13)    0.648 (0.30)






















Table 2.2: List of frequencies and corresponding residues of the complex AM model for synthesized data

(a) SNR = 20 dB

#   Freq.     Residue
1   63.992    53.06 + j8.90
2   59.993    96.11 - j9.67
3   40.007    …
4   21.027    -8.40 + j52.74
5   20.003    109.83 - j46.97
6   16.58     0.19 + j0.04
7   268.4     0.10 + j0.01

(b) SNR = 30 dB

#   Freq.     Residue
1   20.001    110.37 - j50.98
2   21.002    -7.72 - j55.65
3   40.002    58.02 - j3.60
4   96.99     0
5   59.999    103.91 - j11.22
6   63.000    51.97 + j10.15

(c) No noise

#   Freq.     Residue
1   21.00     110.33 - j50.82
2   20.00     -7.65 - j55.32
3   40.00     58.59 - j3.67
4   52.84     0
5   60.00     103.87 - j10.93
6   63.000    51.92 + j10.09





Chapter 3 

Complex FM Signal Model 


3.1 The Model 

It has been shown [2] that the complex FM signal model can be used to represent non-stationary signals such as unvoiced speech phonemes. The complex random sequence $x(n)$ consisting of $M$ single-tone frequency-modulated subsequences is represented by

$$x(n) = \sum_{i=1}^{M} A_i\, e^{j[\omega_i nT + \beta_i \sin(\nu_i nT) + \phi_i]} \qquad (3.1)$$


where

$A_i$ is the amplitude of the complex exponential carrier signal,
$\omega_i$ is the discrete carrier angular frequency,
$\beta_i$ is the modulation index of the sinusoidal modulating signal,
$\nu_i$ is the discrete modulating angular frequency,
$T$ is the sampling interval, and
$\phi_i$ is the random phase, independent and identically distributed.

The random phase is assumed to be uniformly distributed over $[0, 2\pi]$. It has been shown [2] that, unlike in the complex AM signal model case, the time-varying autocorrelation function (ACF) cannot be conveniently utilized here for determining the carrier and modulating angular frequencies, because of its double dependence on time and lag.

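Instead, another function, called the product function, has been utilized for this purpose [3]. For even lags $k$, the product function $p_x(k)$ is defined as

$$p_x(k) = x(k/2)\, x^*(-k/2) \qquad (3.2)$$

Evaluating expression (3.2) for the sum of FM signals $x(n)$, one gets

$$p_x(k) = \sum_{i=1}^{M}\sum_{l=1}^{M} A_i A_l\, e^{j(\phi_i - \phi_l)}\, e^{j(\omega_i + \omega_l)kT/2}\, e^{j[\beta_i \sin(\nu_i kT/2) + \beta_l \sin(\nu_l kT/2)]} \qquad (3.3)$$

In order to investigate the characteristics of the product function $p_x(k)$, we consider the specific case $M = 2$. Using the expansion $e^{j\beta\sin\theta} = \sum_{m=-\infty}^{\infty} J_m(\beta)\, e^{jm\theta}$, where $J_m(\cdot)$ is the Bessel function of the first kind of integer order $m$, the above expression for $p_x(k)$ can, after simplification [2,3], be written as follows:

$$p_x(k) = \sum_{i=1}^{2} A_i^2 \sum_{m=-\infty}^{\infty} J_m(2\beta_i)\, e^{j(2\omega_i + m\nu_i)kT/2} + 2A_1A_2\cos(\phi_1 - \phi_2)\sum_{m_1=-\infty}^{\infty}\sum_{m_2=-\infty}^{\infty} J_{m_1}(\beta_1)\,J_{m_2}(\beta_2)\, e^{j(\omega_1 + \omega_2 + m_1\nu_1 + m_2\nu_2)kT/2}$$

It is clear from this equation that, with $k/2$ as the time index, the spectrum of the sequence $p_x(k)$ contains peaks at twice the carrier angular frequencies and at the sum of the carrier angular frequencies, with an infinite number of side peaks located symmetrically on either side of the main peaks. The clusters centred at twice the carrier frequencies are like individual FM signals.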


3.2 Model Fitting and Parameter Estimation 


For the sum of complex FM signals $x(n)$, a sequence of product function values $p_x(k)$ is computed. This sequence is then fitted to a linear prediction model, and the zeros of the PEF [4] are computed as in the complex AM signal model case. These zeros indicate the frequencies contained in the product function sequence and are processed further to obtain the carrier and modulating frequencies of the model.

As brought out in section (3.1), the product function contains a very large number of frequencies. Therefore, the AR model order in this case has to be chosen very high (greater than 150) to ensure that significant peaks are not missed. Identifying and separating the carrier and modulating angular frequencies is a difficult task. For this, the frequencies obtained from the zeros of the PEF are sorted in ascending order, and the difference between each pair of consecutive frequencies (the lower subtracted from the next higher one) is tabulated. From this table a cluster can be identified. Since each cluster is centred at twice a carrier frequency, and the frequencies on both sides of this centre equal twice the carrier frequency plus or minus multiples of the modulating frequency, the carrier and modulating frequencies can be determined from their respective clusters.
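The cluster-based identification described above can be sketched as follows, under simplifying assumptions: a gap threshold `gap` (hypothetical) separates clusters, the median of a cluster is taken as its centre, and the median spacing within it as the modulating frequency. The sketch does not resolve the sum-frequency clusters discussed next, which still need the trial-and-error step.

```python
import numpy as np


def clusters_to_frequencies(freqs, gap):
    """Group sorted product-function frequencies into clusters.

    Consecutive frequencies closer than `gap` belong to the same cluster. Each
    cluster is assumed to be centred at twice a carrier frequency, with lines
    spaced by the modulating frequency, so it yields (carrier, modulating).
    """
    f = np.sort(np.asarray(freqs, dtype=float))
    breaks = np.where(np.diff(f) > gap)[0] + 1
    result = []
    for cluster in np.split(f, breaks):
        centre = np.median(cluster)
        spacing = float(np.median(np.diff(cluster))) if len(cluster) > 1 else 0.0
        result.append((centre / 2.0, spacing))
    return result
```

For example, the frequencies 114, 116, ..., 126, the isolated line 160, and 266, 273, ..., 294 with `gap = 20` yield (60, 2), (80, 0) and (140, 7); an isolated line gives zero spacing, i.e. an unmodulated carrier.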

The sums of carrier frequencies also form clusters, which can be confused with unmodulated carrier frequencies; whenever this type of conflict arises, a trial-and-error method is used.

The DFT plot of the data $x(n)$ may also have to be used as an aid in accomplishing the above task. Isolated peaks in the spectrum are taken as unmodulated carrier frequencies.

Once the carrier and modulating frequencies have been determined, the next problem is to determine the respective modulation indices and the other parameters. Eqn. (3.1) can be written as

$$x(n) = \sum_{i=1}^{M} A_{ci} \sum_{m=-\infty}^{\infty} J_m(\beta_i)\, e^{j(\omega_i + m\nu_i)nT}$$

where $A_{ci} = A_i e^{j\phi_i}$ is the complex amplitude and $J_m(\beta_i)$ is the Bessel function of the first kind of integer order $m$ and argument $\beta_i$ [10].

It can be seen from the above equation that each FM subsignal of $x(n)$ with non-zero $\beta_i$ contains an infinite number of side frequencies. It is known [9], however, that for an FM signal with $\beta_i$ as high as 30 the number of significant side frequencies is just 70. Therefore we take a fixed number $q$ of side frequencies on each side of the carrier and determine their residues as follows:




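$$\begin{bmatrix} x(1) \\ x(2) \\ \vdots \\ x(N) \end{bmatrix} = \begin{bmatrix} e^{j(\omega_1 - q\nu_1)T} & \cdots & e^{j(\omega_1 + q\nu_1)T} & \cdots & e^{j(\omega_M - q\nu_M)T} & \cdots & e^{j(\omega_M + q\nu_M)T} \\ e^{j2(\omega_1 - q\nu_1)T} & \cdots & e^{j2(\omega_1 + q\nu_1)T} & \cdots & e^{j2(\omega_M - q\nu_M)T} & \cdots & e^{j2(\omega_M + q\nu_M)T} \\ \vdots & & \vdots & & \vdots & & \vdots \\ e^{jN(\omega_1 - q\nu_1)T} & \cdots & e^{jN(\omega_1 + q\nu_1)T} & \cdots & e^{jN(\omega_M - q\nu_M)T} & \cdots & e^{jN(\omega_M + q\nu_M)T} \end{bmatrix} \begin{bmatrix} A_{c1}J_{-q}(\beta_1) \\ \vdots \\ A_{c1}J_{q}(\beta_1) \\ \vdots \\ A_{cM}J_{-q}(\beta_M) \\ \vdots \\ A_{cM}J_{q}(\beta_M) \end{bmatrix}$$

where $q$ is suitably chosen so that the above set of equations remains overdetermined.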


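$R_{i,m} = A_{ci} J_m(\beta_i)$ are the residues for the respective frequencies of the signal $x(n)$. It is known [10] that

$$\sum_{m=-\infty}^{\infty} J_m^2(\beta) = 1 \qquad (3.5)$$

Therefore the sum of squares of the residues of each FM subsignal gives the (squared) complex amplitude of that subsignal:

$$A_{ci}^2 \approx [A_{ci}J_{-q}(\beta_i)]^2 + \cdots + [A_{ci}J_0(\beta_i)]^2 + \cdots + [A_{ci}J_q(\beta_i)]^2 \qquad (3.6)$$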


Knowing the complex amplitude, we can determine $J_m(\beta_i)$ from the residues, and hence the modulation index $\beta_i$ of each FM subsignal. Note that the squaring operation indicated above loses the sign of the complex amplitude ($A_{ci}$ and $-A_{ci}$ give the same square). Hence we try both the positive and the negative square root, regenerate the residues after finding the modulation index in each case, and choose the complex amplitude and modulation index that regenerate the original residues.
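The residue fit of the overdetermined system above, followed by amplitude and modulation-index recovery with Eqns. (3.5) and (3.6), can be sketched as below using `scipy.special.jv` for $J_m$. Determining $\beta_i$ by a grid search that matches the normalised residues to $J_m(\beta)$ is an assumption (the text does not say how $\beta_i$ is obtained from the $J_m$ values); the grid range and the function names are hypothetical. Time indices run from $n = 1$ to $N$ as in the matrix equation.

```python
import numpy as np
from scipy.special import jv


def fm_residues(x, T, components, q):
    """Least-squares residues of x(n), n = 1..N, at omega_i + m nu_i, m = -q..q.

    `components` is a list of (omega_i, nu_i); nu_i = 0 marks an unmodulated
    carrier, which gets a single column. Returns one residue array per component.
    """
    x = np.asarray(x, dtype=complex)
    n = np.arange(1, len(x) + 1)[:, None]
    cols, sizes = [], []
    for omega, nu in components:
        m = np.arange(-q, q + 1) if nu != 0 else np.array([0])
        cols.append(np.exp(1j * n * T * (omega + m * nu)))
        sizes.append(len(m))
    r, *_ = np.linalg.lstsq(np.hstack(cols), x, rcond=None)
    return np.split(r, np.cumsum(sizes)[:-1])


def fm_amplitude_and_index(R, q, beta_grid=np.linspace(0.0, 10.0, 100001)):
    """Eqn (3.6) plus the sign test: complex amplitude A_ci and index beta_i."""
    R = np.asarray(R, dtype=complex)
    m = np.arange(-q, q + 1)
    root = np.sqrt(np.sum(R ** 2))                # plain squares, not |R|^2
    table = jv(m[:, None], beta_grid[None, :])     # J_m(beta) on the grid
    best = None
    for Ac in (root, -root):                       # sign is lost by squaring
        J_hat = (R / Ac).real
        beta = beta_grid[np.argmin(np.sum((table - J_hat[:, None]) ** 2, axis=0))]
        mismatch = np.linalg.norm(Ac * jv(m, beta) - R)   # regenerate residues
        if best is None or mismatch < best[2]:
            best = (Ac, beta, mismatch)
    return best[0], best[1]
```

With the first FM subsignal of section (3.3) plus the unmodulated 80 rad carrier, noise-free, and $q = 5$, this sketch returns a residue close to $1.25e^{j2.25}$ for the carrier, and $A_{c1} \approx 1.5e^{j1.5}$ with $\beta_1 \approx 0.4$.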

After all the parameters have been estimated, a data set is regenerated with these parameters and compared with the original input signal using the spectral distortion measure [5]. In case of a mismatch, a new choice may have to be made and the estimation process repeated until a proper match between the input and regenerated signals is obtained.


3.3 Simulation Study 

In this section, computer-synthesized data are considered for model fitting by the complex FM signal model.

The complex sequence $x(n)$, $n = 0, \pm 1, \dots, \pm (N-1)/2$, consisting of two single-tone frequency-modulated subsignals and one unmodulated carrier subsignal, is sampled at N = 801 points. The sets of parameters of model (3.1) are chosen as follows:

$$A_1 = 1.50,\quad \omega_1 = 60,\quad \nu_1 = 2,\quad \phi_1 = 1.50,\quad \beta_1 = 0.40$$
$$A_2 = 1.25,\quad \omega_2 = 80,\quad \nu_2 = 0,\quad \phi_2 = 2.25,\quad \beta_2 = 0.00$$
$$A_3 = 0.75,\quad \omega_3 = 140,\quad \nu_3 = 7,\quad \phi_3 = 1.00,\quad \beta_3 = 1.00$$

and the sampling interval is T = 0.01 units of time.

Utilizing the sampled sequence, the product function sequence $\{p_x(k)\}$ is computed using Eqn. (3.2) for $k = 0, \pm 2, \pm 4, \dots, \pm L$ with $L = 800$. The resulting 801-point sequence is then fitted to a linear prediction model, and the AR coefficients are determined with the modified covariance technique discussed in chapter 2. The zeros of the PEF give the various frequencies contained in the product function sequence. These frequencies are then arranged in ascending order and clusters are selected; as explained earlier, each cluster has twice a carrier frequency at its centre. The PSD plot of the data is also used to identify the unmodulated carrier frequency. After estimation of the carrier and modulating frequencies, the other parameters (modulation index, amplitude and phase) are found using Eqns. (3.4) and (3.6), with q chosen to be 5. The signal $x(n)$ is then mixed with varying degrees of noise and the estimation procedure is repeated. True and estimated values of the various parameters, along with the percentage errors, are listed in table (3.1).

Frequencies and difference frequencies are also shown to explain the identification of clusters and the determination of the carrier and modulating frequencies.
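Computing the product-function sequence for $k = 0, \pm 2, \dots, \pm L$ can be sketched as below. The form $p_x(k) = x(k/2)\,x^*(-k/2)$ is an assumption here, chosen because it is consistent with the symmetric sampling about $n = 0$ and with the even lags up to $L = 800$ used for the 801-point sequence; with this form a single unmodulated carrier $\omega$ gives a line at $2\omega$ when $k/2$ is taken as the time index. The function name is hypothetical.

```python
import numpy as np


def product_function(x):
    """p(k) = x(k/2) * conj(x(-k/2)) for k = -L..L in steps of 2, with L = N - 1.

    x holds samples x(n) for n = -(N-1)/2 .. (N-1)/2 (N odd), so entry i of the
    result corresponds to n = k/2 = i - (N - 1)/2.
    """
    x = np.asarray(x, dtype=complex)
    if len(x) % 2 == 0:
        raise ValueError("expected an odd number of symmetric samples")
    return x * np.conj(x[::-1])      # x(n) * conj(x(-n))
```

Because the output is indexed by $n = k/2$ with spacing $T$, the zeros of a PEF fitted to it are read directly as $2\omega_i + m\nu_i$ and $\omega_i + \omega_l + m_1\nu_i + m_2\nu_l$.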



Table 3.1: Estimated values of complex FM model parameters for synthesized data

                          Estimated value (percentage error)
Parameter   True value    No noise         SNR = 30 dB      SNR = 20 dB
ω1          60.000        60.000 (0.00)    60.005 (0.00)    60.000 (0.00)
ω2          80.000        80.000 (0.00)    80.000 (0.01)    80.000 (0.00)
ω3          140.000       140.000 (0.00)   140.000 (0.00)   140.000 (0.00)
ν1          2.000         2.000 (0.00)     2.001 (0.12)     2.12 (6.00)
ν2          -             -                -                -
ν3          7.000         7.000 (0.00)     7.000 (0.00)     7.000 (0.00)
A1          1.500         1.500 (0.00)     1.501 (0.07)     1.445 (3.60)
A2          1.250         1.250 (0.00)     1.252 (0.15)     1.254 (0.32)
A3          0.750         0.750 (0.00)     0.749 (0.13)     0.747 (0.40)
φ1          1.500         1.500 (0.00)     1.497 (0.22)     1.496 (0.26)
φ2          2.250         2.250 (0.00)     2.252 (0.07)     2.254 (0.18)
φ3          1.000         1.000 (0.00)     1.005 (0.48)     1.015 (1.51)
β1          0.400         0.400 (0.00)     0.40 (0.00)      0.43 (7.5)
β2          -             -                -                -
β3          1.000         1.000 (0.00)     1.000 (0.00)     1.01 (1.00)
























Chapter 4 


Speech Coding 


Parametric modeling by the complex AM and FM signal models has been studied for sustained vowel and consonant sounds [3]. In natural speech, however, the signal is not steady, i.e. the energy or gain of the speech varies with time. Hence the models cannot be applied directly to continuously spoken speech with varying gain. In this chapter the time-varying nature of the speech gain is considered, and the speech signal is made of constant gain before these models are applied.


4.1 Estimation and Modeling of Gain 


Let $s(n)$ be the speech sequence with varying gain, and consider another sequence $y(n)$ such that

$$y(n) = \frac{s(n)}{g(n)} \qquad (4.1)$$

where $g(n) > 0$ for all $n$.

The quantity $g(n)$ is the time-varying gain of the speech sequence $s(n)$, and the resulting sequence $y(n)$ is of constant gain or variance. It remains to show how one obtains $y(n)$ from $s(n)$; in other words, how the gain factor $g(n)$ is estimated for each $n$. Since $g(n)$ is assumed to evolve slowly with time, it can be estimated as the local envelope or short-time energy of $s(n)$ [11].

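For a symmetric window $w$ of even length $L$, $g(n)$ can be estimated as

$$\hat{g}(n) = \frac{1}{L}\sum_{i=1}^{L}\left| w(i)\, s(n + i - L/2) \right| \qquad (4.2)$$

for those $n$ at which the window lies entirely inside the record of $s(n)$.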



The choice of the window length $L$ depends on the nature of the signal $s(n)$ and on that of the sequence $g(n)$. For very small $L$ the averaging is incomplete, and the randomness of $s(n)$ is reflected in $\hat{g}(n)$. On the other hand, if $L$ is too large, $\hat{g}(n)$ fails to follow the local variation of the envelope.

Note that the length of the gain sequence $\hat{g}(n)$ is $N - L$, while that of the sequence $s(n)$ is $N$, so $s(n)$ cannot be divided by $\hat{g}(n)$ sample by sample. This difficulty is solved by padding $s(n)$ with $L/2$ zeros on both sides.
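The gain estimate of Eqn. (4.2), with the $L/2$ zero padding described above and the division of Eqn. (4.1), can be sketched in NumPy as follows. `np.hanning` supplies the Hanning window used in section (4.3); the small `floor` that keeps $\hat{g}(n) > 0$ where the signal is exactly zero is an added assumption, and the function names are hypothetical.

```python
import numpy as np


def estimate_gain(s, L, floor=1e-12):
    """Windowed envelope of Eqn (4.2), with L/2 zeros padded on both sides of s(n).

    Returns g_hat(n) for n = 0..N-1 (same length as s). `floor` keeps g_hat > 0,
    as Eqn (4.1) requires, where the signal is exactly zero.
    """
    s = np.asarray(s, dtype=float)
    if L % 2:
        raise ValueError("L must be even")
    w = np.hanning(L)
    padded = np.concatenate([np.zeros(L // 2), s, np.zeros(L // 2)])
    # correlate(...)[m] = sum_{i=0}^{L-1} w[i] |padded[m + i]|; entry n + 1 equals
    # sum_{i=1}^{L} w(i) |s(n + i - L/2)| in the notation of Eqn (4.2)
    env = np.correlate(np.abs(padded), w, mode="valid")[1:] / L
    return np.maximum(env, floor)


def normalise_gain(s, L):
    """y(n) = s(n) / g_hat(n), Eqn (4.1); returns (y, g_hat)."""
    g = estimate_gain(s, L)
    return np.asarray(s, dtype=float) / g, g
```

For a steady sinusoid the estimated gain is nearly flat away from the edges, while near the edges the zero padding makes it taper, which is where the division amplifies the signal most.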

The gain sequence $\hat{g}(n)$ is then fitted, in the least-squares sense, by a polynomial model as follows:

$$\hat{g}(n) = p_1 n^{m} + p_2 n^{m-1} + p_3 n^{m-2} + \cdots + p_m n + p_{m+1} \qquad (4.5)$$

where $m$ is the model order and $p_1, \dots, p_{m+1}$ are the polynomial coefficients.
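The least-squares polynomial fit of Eqn. (4.5) maps directly onto `numpy.polyfit`, whose coefficient ordering (highest power first) matches $p_1, \dots, p_{m+1}$. The sketch below wraps it; the wrapper names are illustrative.

```python
import numpy as np


def fit_gain_polynomial(g, order):
    """Least-squares fit of Eqn (4.5); coefficients p_1..p_{m+1}, highest power first."""
    n = np.arange(len(g), dtype=float)
    return np.polyfit(n, g, order)


def regenerate_gain(p, N):
    """Evaluate p_1 n^m + ... + p_m n + p_{m+1} for n = 0..N-1."""
    return np.polyval(p, np.arange(N, dtype=float))
```

For the higher orders used in section (4.3) (up to 8) on raw sample indices, `np.polyfit` may warn that the fit is poorly conditioned; `numpy.polynomial.Polynomial.fit`, which rescales $n$ to $[-1, 1]$, is a better-conditioned alternative, although its coefficients then refer to the scaled variable unless converted back.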


4.2 Estimation of Speech Parameters 

Before estimating the parameters, one should know the time interval of each phoneme of the continuously spoken speech. Phoneme segmentation is outside the scope of this study; here it is assumed that the separated phonemes are already available.

Each phoneme sequence is selected, and its gain is estimated over the whole duration of the phoneme. The phoneme sequence is then divided by this gain over its whole length. In this way a constant-gain phoneme sequence is obtained, to which the complex AM or FM signal model can be applied.

It has been shown in [3] that the complex AM signal model is suitable for vowel sounds while the FM signal model is suitable for consonant sounds. Simulation studies show, however, that this classification is not appropriate: many consonant sounds can also be modeled by the complex AM signal model. The simulation studies show that phonemes having most of their energy in the low-frequency region can be modeled by the complex AM signal model, and phonemes having most of their energy in the high-frequency region by the complex FM signal model.



The frequency range of a phoneme is examined with a simple PSD plot. The standard Fourier representations that are appropriate for periodic, transient or stationary random signals are not directly applicable to the speech signal, whose properties change markedly as a function of time; nevertheless, a simple PSD plot of the phoneme is taken just to get an idea of its frequency content. Once it is known which model to apply, each phoneme is modeled by either the complex AM or the complex FM signal model, as discussed in chapters 2 and 3.
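The PSD-based choice between the two models can be sketched as an energy split of the periodogram. The 2 kHz boundary (`cutoff_hz`) and the 50 % rule are hypothetical, since the text only distinguishes "low" and "high" frequency regions; the function name is illustrative.

```python
import numpy as np


def choose_model(x, fs, cutoff_hz=2000.0):
    """Return "AM" if most periodogram energy lies below cutoff_hz, else "FM".

    cutoff_hz is a hypothetical boundary between "low" and "high" frequency.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                         # phonemes are made zero-mean
    P = np.abs(np.fft.rfft(x)) ** 2
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)
    low_fraction = P[f < cutoff_hz].sum() / P.sum()
    return "AM" if low_fraction >= 0.5 else "FM"
```

At the 16 kHz sampling rate of section (4.3), a 300 Hz tone is routed to the AM model and a 5 kHz tone to the FM model; real phonemes would sit between these extremes, which is why the boundary would need tuning.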


4.3 Simulation Study 

In this section a natural speech signal from the TIMIT database, with its lexicon and phonetic transcription, is used. The sentence "don’t ask me to", spoken by a woman, is chosen for model fitting and estimation of parameters. The phonetic details of the sentence are given in tables (4.1) and (4.2). The speech data are sampled at 16 kHz. The signal sequence of each phoneme is normalized and made zero-mean before model fitting.

For each phoneme a PSD plot is taken, and the complex AM or FM signal model is applied according to section 4.2. All 11 phonemes given in table 4.2 could be faithfully regenerated after fitting them to the complex AM or FM model and estimating the parameters. The original and regenerated signals, together with the various intermediate steps, are explained for the phonemes ‘n’, ‘dx’, ‘ix’ and ‘s’.

For phoneme ‘n’ the PSD plot is shown in figure (4.6). It is clear from the figure that most of the energy is in the low-frequency region, so the phoneme can be fitted by the complex AM model. The gain of this phoneme signal is estimated with a Hanning window of length L = 140, and the gain sequence is fitted by a polynomial of order 4, whose coefficients are given in table (4.4). The original and regenerated gains are shown in figures (4.3) and (4.4), respectively. The original phoneme signal is divided by the gain sequence, and the resultant signal is shown in figure (4.5(a)). This signal is fitted by the complex AM model, and all parameters are determined as discussed in chapter 2. After estimation of the parameters, the signal is regenerated, as shown in figure (4.5(b)).

Finally, the regenerated signal shown in figure (4.5(b)) is multiplied by the regenerated gain shown in figure (4.4) to recover the original phoneme signal with varying gain, shown in figure (4.1).

The estimated parameter values for the regenerated signal of phoneme ‘n’ are shown in table (4.3).

The PSD plots for the phonemes ‘dx’ and ‘ix’ are shown in figures (4.12) and (4.18), respectively. It is apparent from the plots that these phonemes can also be fitted by the complex AM signal model. The Hanning window length for estimating the gain is L = 140 for phoneme ‘dx’ and L = 150 for phoneme ‘ix’. The gain sequence of phoneme ‘dx’ is fitted by a polynomial of order 5, while that of ‘ix’ is fitted by a polynomial of order 6. The parameters of these two phonemes are estimated in the same way as for phoneme ‘n’. The final original and regenerated signals and various intermediate figures for phoneme ‘dx’ are plotted in figures (4.7), (4.8), (4.9), (4.10) and (4.11).

The final and regenerated signals and various intermediate figures for phoneme ‘ix’ are plotted in figures (4.13), (4.14), (4.15), (4.16) and (4.17).

The estimated parameters of phonemes ‘dx’ and ‘ix’ are shown in tables (4.5) and (4.7), respectively, and the polynomial coefficients for phonemes ‘dx’ and ‘ix’ in tables (4.6) and (4.8), respectively.

From the PSD plot of phoneme ‘s’ shown in figure (4.24), it is clear that most of the energy lies in the high-frequency region, so the complex FM signal model is used to fit this signal. The gain is estimated with a Hanning window of length L = 300 and fitted by a polynomial of order 8. The product function sequence is then fitted to a linear prediction model with an AR model order greater than 180. Using the zeros of the PEF and forming clusters as explained in section (3.2), the frequency parameters of the model are determined. Using these frequencies, all other parameters of the model are then determined by the method discussed in section (3.2).

The final regenerated signal and the original signal with varying gain for this phoneme are shown in figures (4.19) and (4.20), respectively. Various other intermediate plots are shown in figures (4.21), (4.22) and (4.23).

The estimated parameters of the phoneme and the polynomial coefficients for the gain are listed in tables (4.9) and (4.10).







Fig. 4.3: The original gain function of phoneme sound ‘n’

Fig. 4.4: The regenerated gain function of phoneme sound ‘n’













Fig. 4.8: The regenerated phoneme sound ‘dx’ 
Fig. 4.12: The PSD plot of phoneme sound ‘dx’ 
Fig. 4.18: The PSD plot of phoneme sound ‘ix’ 
Fig. 4.24: The PSD plot of phoneme sound ‘s’ 
Table 4.1: Separated words with respect to sample number 

Sample n          Word 
2260 to 4600      don’t 
4600 to 8640      ask 
8640 to 9520      me 
9520 to 10736     to 

Table 4.2: The phonetic transcription 

Sample #          Phoneme 
2260 to 2730      d 
2730 to 4120      uh 
4120 to 4600      n 
4600 to 6864      ao 
6864 to 7920      s 
7920 to 8270      kcl 
8270 to 8640      k 
8640 to 8856      m 
8856 to 9520      ix 
9520 to 9960      dx 
9960 to 10736     ix 
Table 4.3: Estimated parameters for phoneme sound ‘n’ 

n    ω          ν         β         Amp (A)   Phase (φ) 
1    ±1408      ±2924     2.2989    ?         ±2.2449 
2    ±5714      ±1427     1.3315    0.2463    ±4.2662 
3    ±12804     ±1380     1.9458    0.1144    ±3.7867 
4    ±15649     ±6770     0.4805    ?         ±0.7324 
5    ±1433      -         -         0.7193    ±1.3445 
6    ±2876      -         -         1.5741    ±6.1514 
(? = illegible in the source scan) 


Table 4.4: Polynomial coefficients for gain function of phoneme sound ‘n’ 

Coefficient   Value 
p1            -1.10?0 x 10^? 
p2            1.0491 x 10^-6 
p3            -0.0003 
p4            0.0390 
p5            -2.4542 
(? = illegible in the source scan) 
Table 4.5: Estimated parameters for phoneme sound ‘dx’ 

n    ω          ν         β         Amp (A)   Phase (φ) 
1    ±1380      ±2641     0.1503    1.4753    ±1.3789 
2    ±5379      ±1342     1.4655    0.1388    ±0.0888 
3    ±11832     ±232      1.1430    0.1468    ±4.2269 
4    ±17264     ±410      1.1512    0.0874    ±5.4351 
5    ±2754      -         -         0.6329    ±3.9300 


Table 4.6: Polynomial coefficients for gain function of phoneme sound ‘dx’ 

Coefficient   Value 
p1            ? 
p2            3.2055 x 10^-9 
p3            -1.1448 x 10^? 
p4            -0.0003 
p5            0.0743 
p6            -3.2021 
(? = illegible in the source scan) 


















Table 4.7: Estimated parameters for phoneme sound ‘ix’ 

n    ω          ν         β         Amp (A)   Phase (φ) 
1    ±1449      ±1424     1.1573    0.1267    ±1.4196 
2    ±5835      ±1405     0.8056    0.1483    ±3.7086 
3    ±11487     ±1417     0.7755    1.1179    ±4.7595 
4    ±3007      -         -         0.4439    ±3.4297 
5    ±4358      -         -         0.5874    ±4.1344 


Table 4.8: Polynomial coefficients for gain function of phoneme sound ‘ix’ 

Coefficient   Value 
p1            -5.008 x 10^? 
p2            1.1707 x 10^? 
p3            -9.6643 x 10^? 
p4            4.1041 x 10^? 
p5            -0.0009 
p6            0.0785 
p7            -1.1776 
(? = illegible in the source scan) 


























Table 4.9: Estimated parameters for phoneme sound ‘s’ 

n    ω          ν    β       Amp (A)   Phase (φ) 
1    ±24619     ?    0.70    0.5868    ±2.4613 
2    ±27162     ?    8.40    0.4000    ±2.7800 
3    ±36918     ?    6.70    0.2610    ±5.9942 
4    ±39711     ?    0.10    0.1572    ±0.5916 
5    ±47254     ?    -       1.1028    ±3.2031 
(? = illegible in the source scan) 


Table 4.10: Polynomial coefficients for gain function of phoneme sound ‘s’ 

Coefficient   Value 
p1            2.92 x 10^? 
p2            -1.39 x 10^? 
p3            2.69 x 10^? 
p4            -2.67 x 10^? 
p5            ? 
p6            -4.20 x 10^? 
p7            0.0006 
p8            -0.0292 
p9            -0.690? 
(? = illegible in the source scan) 
























4.4 Quantization of Parameters 


This section explains the quantization of the estimated parameters, 
using the parameters of phoneme sound ‘n’ as the example. 

Table (4.3) shows that a total of 26 parameter values is required to 
synthesize the phoneme ‘n’. Before quantization, the frequencies ω and ν 
are divided by 10000; the mantissa and exponent parts are then quantized 
separately. The mantissa of each frequency can be quantized with ≈ 6 bits. 
There are 10 frequencies in total (6 values of ω and 4 values of ν), so 
≈ 60 bits are required for the mantissas of all the frequencies, and a 
further 6 bits quantize the exponent part. In this way a total of 66 bits 
quantizes all the frequencies. 
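The frequency quantization above can be sketched as follows. The text fixes the division by 10000 and the 6-bit mantissa; the rest is assumed for illustration: each normalized frequency is written as m x 10^e with mantissa m in [0.1, 1), and m is quantized uniformly over that interval with reconstruction at the cell centre. The helper names are hypothetical, and the sketch codes magnitudes only, leaving open how the thesis spends its 6 exponent bits and handles the ± signs.

```python
import math

SCALE = 10000.0   # frequencies are divided by 10000 before quantization (from the text)
M_BITS = 6        # mantissa bits per frequency (from the text)

def split(value):
    """Write a positive value as m * 10**e with mantissa m in [0.1, 1) (assumed convention)."""
    e = math.floor(math.log10(value)) + 1
    return value / 10 ** e, e

def quantize_mantissa(m, bits=M_BITS, lo=0.1, hi=1.0):
    """Uniform quantizer on [lo, hi): returns an integer index in [0, 2**bits)."""
    levels = 2 ** bits
    step = (hi - lo) / levels
    return min(levels - 1, max(0, int((m - lo) / step)))

def dequantize_mantissa(q, bits=M_BITS, lo=0.1, hi=1.0):
    """Reconstruct at the centre of quantization cell q."""
    step = (hi - lo) / 2 ** bits
    return lo + (q + 0.5) * step

omegas = [1408, 5714, 12804, 15649, 1433, 2876]   # |omega| values of phoneme 'n' (Table 4.3)
for w in omegas:
    m, e = split(w / SCALE)
    q = quantize_mantissa(m)
    w_hat = dequantize_mantissa(q) * 10 ** e * SCALE
    print(w, e, q, round(w_hat))
```

With 6 bits the mantissa step is 0.9 / 64, so the reconstruction error of each frequency is at most half a step times its scale factor.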

There are 4 modulation index parameters, 6 amplitude parameters and 6 
phase parameters, and each of these requires ≈ 6 bits, giving a total of 
16 x 6 = 96 bits for these parameters. 

The polynomial coefficients of the gain function of phoneme sound ‘n’ are 
shown in table (4.4). Each coefficient needs 1 sign bit, ≈ 6 bits for the 
mantissa and ≈ 4 bits for the exponent, so one coefficient can be quantized 
with ≈ 11 bits. The 5 coefficients therefore require ≈ 55 bits. 
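The 11-bit coefficient format (1 sign bit, 6 mantissa bits, 4 exponent bits) can be illustrated by packing one coefficient into an integer word. The mantissa range [0.1, 1), the exponent offset `E_BIAS` used to store negative exponents in 4 bits, the bit order and the function names are assumptions made for this sketch; the text does not specify them. The usage line applies it to the legible entries of Table 4.4.

```python
import math

M_BITS, E_BITS = 6, 4
E_BIAS = 12          # assumed offset: exponents -12..3 fit in 4 bits

def pack_coeff(c):
    """Pack a nonzero coefficient into an 11-bit word: sign | exponent | mantissa."""
    sign = 1 if c < 0 else 0
    mag = abs(c)
    e = math.floor(math.log10(mag)) + 1            # mag = m * 10**e with m in [0.1, 1)
    m = mag / 10 ** e
    if not 0 <= e + E_BIAS < 2 ** E_BITS:
        raise ValueError("exponent does not fit in 4 bits")
    step = 0.9 / 2 ** M_BITS
    q = min(2 ** M_BITS - 1, max(0, int((m - 0.1) / step)))
    return (sign << (E_BITS + M_BITS)) | ((e + E_BIAS) << M_BITS) | q

def unpack_coeff(word):
    """Inverse of pack_coeff, reconstructing the mantissa at the cell centre."""
    q = word & (2 ** M_BITS - 1)
    e = ((word >> M_BITS) & (2 ** E_BITS - 1)) - E_BIAS
    sign = -1 if word >> (E_BITS + M_BITS) else 1
    m = 0.1 + (q + 0.5) * 0.9 / 2 ** M_BITS
    return sign * m * 10 ** e

coeffs = [1.0491e-6, -0.0003, 0.0390, -2.4542]     # legible entries of Table 4.4
words = [pack_coeff(c) for c in coeffs]
bits_per_coeff = 1 + M_BITS + E_BITS
print(bits_per_coeff, 5 * bits_per_coeff)          # -> 11 55
```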

Thus the total number of bits required to quantize all the parameters of 
phoneme sound ‘n’ by simple uniform quantization is ≈ 66 + 96 + 55 = 217 
bits. Since the phoneme ‘n’ lasts 30 ms, the coding rate is ≈ 7.2 kbps. 
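The bit budget and coding rate can be checked with a few lines of arithmetic. The bit counts are taken from the text; the 16 kHz sampling rate is an assumption, chosen because it makes the 480 samples of ‘n’ in Table 4.2 (samples 4120 to 4600) last exactly the stated 30 ms.

```python
# Bit budget for phoneme 'n' using the counts given in the text.
freq_bits = 10 * 6 + 6            # 10 frequency mantissas at 6 bits + 6 exponent bits = 66
other_bits = (4 + 6 + 6) * 6      # modulation indices, amplitudes and phases at 6 bits = 96
poly_bits = 5 * (1 + 6 + 4)       # 5 gain coefficients at 11 bits = 55
total_bits = freq_bits + other_bits + poly_bits

# ASSUMPTION: 16 kHz sampling, so samples 4120 to 4600 (Table 4.2) span 30 ms.
fs = 16000
duration_s = (4600 - 4120) / fs
rate_bps = total_bits / duration_s
print(total_bits, duration_s, round(rate_bps))   # -> 217 0.03 7233
```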

The coding rate for the other phonemes can be calculated approximately in 
the same way. This quantization method is very crude and is used only to 
check the potential for data compression; if better quantization methods 
such as vector quantization are applied, a further reduction in coding 
rate is expected. 




Chapter 5 


Conclusion 


Phonemes having most of their energy in the low-frequency region are well 
modeled by the complex AM model, while phonemes having most of their energy 
in the high-frequency region are best modeled by the complex FM model. 
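In this work the choice between the two models is made by inspecting PSD plots such as figures (4.12), (4.18) and (4.24). The rule can be sketched as an energy-split test on the periodogram. The split frequency (2 kHz), the 0.5 threshold, the 16 kHz sampling rate, the function name and the test tones are all hypothetical choices for illustration, not values from the thesis.

```python
import numpy as np

def choose_model(x, fs=16000.0, split_hz=2000.0):
    """Return 'AM' if most of the periodogram energy lies below split_hz, else 'FM'."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    psd = np.abs(np.fft.rfft(x)) ** 2                  # unscaled periodogram
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    low = psd[freqs < split_hz].sum()
    return "AM" if low >= 0.5 * psd.sum() else "FM"

n = np.arange(480)
voiced_like = np.cos(2 * np.pi * 200 * n / 16000)      # energy at 200 Hz
fricative_like = np.cos(2 * np.pi * 5000 * n / 16000)  # energy at 5 kHz
print(choose_model(voiced_like), choose_model(fricative_like))  # -> AM FM
```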

In addition, in this method of speech coding the parameters remain the same 
for the entire duration of a phoneme, so the approach is not frame based, 
which reduces much of the complexity. 

Because the parameter set changes at each phoneme boundary, a continuity 
problem arises when the speech signal is regenerated, and further work in 
this direction is desirable. 